Self-supervised Regularization for Text Classification
Authors
Abstract
Text classification is a widely studied problem and has broad applications. In many real-world problems, the number of texts for training classification models is limited, which renders these models prone to overfitting. To address this problem, we propose SSL-Reg, a data-dependent regularization approach based on self-supervised learning (SSL). SSL (Devlin et al., 2019a) is an unsupervised learning approach that defines auxiliary tasks on input data without using any human-provided labels and learns data representations by solving these auxiliary tasks. In SSL-Reg, a supervised classification task and an unsupervised SSL task are performed simultaneously. The SSL task is unsupervised: it is defined purely on input texts without using any human-provided labels. Training a model using an SSL task can prevent the model from being overfitted to a limited number of class labels in the classification task. Experiments on 17 text classification datasets demonstrate the effectiveness of our proposed method. Code is available at https://github.com/UCSD-AI4H/SSReg.
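As a minimal sketch of the idea described in the abstract, the Python snippet below adds a weighted self-supervised loss to a supervised classification loss, so that both tasks contribute to one training objective. The function names, the weight `lam`, and the use of a token-prediction style cross-entropy as the SSL loss are illustrative assumptions; they are not taken from the paper or its code release.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of index `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_loss(cls_logits, cls_label, ssl_logits, ssl_targets, lam=0.1):
    """Supervised loss plus `lam` times the mean self-supervised loss.

    cls_logits / cls_label: class scores and gold label for one text.
    ssl_logits / ssl_targets: per-position scores and targets for an
    auxiliary task built from the input text alone (hypothetical here).
    lam: assumed trade-off weight for the regularizer.
    """
    supervised = cross_entropy(cls_logits, cls_label)
    ssl_terms = [cross_entropy(l, t) for l, t in zip(ssl_logits, ssl_targets)]
    ssl = sum(ssl_terms) / len(ssl_terms)
    return supervised + lam * ssl
```

With `lam=0` the objective reduces to the plain supervised loss; a larger `lam` makes the label-free auxiliary task weigh more heavily as a regularizer.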
Related papers
Semi-supervised learning for text classification using feature affinity regularization
Most conventional semi-supervised learning methods attempt to directly include unlabeled data into training objectives. This paper presents an alternative approach that learns feature affinity information from unlabeled data, which is incorporated into the training objective as regularization of a maximum entropy model. The regularization favors models for which correlated features have similar...
Soft-Supervised Learning for Text Classification
We propose a new graph-based semi-supervised learning (SSL) algorithm and demonstrate its application to document categorization. Each document is represented by a vertex within a weighted undirected graph and our proposed framework minimizes the weighted Kullback-Leibler divergence between distributions that encode the class membership probabilities of each vertex. The proposed objective is con...
Variational Autoencoder for Semi-Supervised Text Classification
Although semi-supervised variational autoencoder (SemiVAE) works in image classification task, it fails in text classification task if using vanilla LSTM as its decoder. From a perspective of reinforcement learning, it is verified that the decoder’s capability to distinguish between different categorical labels is essential. Therefore, Semi-supervised Sequential Variational Autoencoder (SSVAE) ...
Sprinkling Topics for Weakly Supervised Text Classification
Supervised text classification algorithms require a large number of documents labeled by humans, which involves a labor-intensive and time-consuming process. In this paper, we propose a weakly supervised algorithm in which supervision comes in the form of labeling of Latent Dirichlet Allocation (LDA) topics. We then use this weak supervision to “sprinkle” artificial words to the training documents...
A Supervised Clustering Method for Text Classification
This paper describes a supervised three-tier clustering method for classifying students’ essays of qualitative physics in the Why2-Atlas tutoring system. Our main purpose of categorizing text in our tutoring system is to map the students’ essay statements into principles and misconceptions of physics. A simple ‘bag-of-words’ representation using a naive Bayes algorithm to categorize text was un...
Journal
Journal title: Transactions of the Association for Computational Linguistics
Year: 2021
ISSN: ['2307-387X']
DOI: https://doi.org/10.1162/tacl_a_00389